148 research outputs found

    MonetDB & R


    Best of Both Worlds – Relational Databases and Statistics

    Statistical software packages and relational database systems overlap considerably in the area of data loading, handling, and transformation. However, only databases are primarily optimized for high performance in this area. In this paper, we present our approach to bringing the best of these two worlds together. We integrate the analytics-optimized database MonetDB and the R environment for statistical computing in a non-obtrusive, transparent and compatible way.

    Don't hold my data hostage - A case for client protocol redesign

    Transferring a large amount of data from a database to a client program is a surprisingly expensive operation. The time this requires can easily dominate the query execution time for large result sets. This represents a significant hurdle for external data analysis, for example when using statistical software. In this paper, we explore and analyse the result set serialization design space. We present experimental results from a large portion of the database market and show the inefficiencies of current approaches. We then propose a columnar serialization method that improves transmission performance by an order of magnitude.
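
    The abstract does not describe the paper's actual wire format. As a minimal sketch, assuming a toy two-column result set (ROWS and all function names are hypothetical), the following contrasts row-major and column-major packing using Python's struct module. In the columnar layout each column is contiguous, so a client can decode it with a single bulk unpack call rather than one call per row.

```python
import struct

# Hypothetical result set: rows of (id: int32, price: float64)
ROWS = [(1, 9.5), (2, 12.25), (3, 7.0)]

def serialize_row_wise(rows):
    """Row-major layout: each row's fields are packed one after another."""
    return b"".join(struct.pack("<id", rid, price) for rid, price in rows)

def serialize_columnar(rows):
    """Column-major layout: a row count, then all ids, then all prices."""
    n = len(rows)
    ids = [r[0] for r in rows]
    prices = [r[1] for r in rows]
    return (struct.pack("<I", n)
            + struct.pack(f"<{n}i", *ids)
            + struct.pack(f"<{n}d", *prices))

def deserialize_columnar(buf):
    """Decode each column with one bulk unpack call, then rebuild rows."""
    (n,) = struct.unpack_from("<I", buf, 0)
    ids = struct.unpack_from(f"<{n}i", buf, 4)
    prices = struct.unpack_from(f"<{n}d", buf, 4 + 4 * n)
    return list(zip(ids, prices))

print(deserialize_columnar(serialize_columnar(ROWS)))
```

    In this toy case the columnar buffer is slightly larger because of its row-count header; the benefit it illustrates is bulk, per-column decoding, which also makes per-column compression straightforward.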

    Data Management for Data Science - Towards Embedded Analytics

    The rise of Data Science has caused an influx of new users in need of data management solutions. However, instead of utilizing existing RDBMS solutions, they are opting to use a stack of independent solutions for data storage and processing glued together by scripting languages. This is not because they do not need the functionality that an integrated RDBMS provides, but rather because existing RDBMS implementations do not cater to their use case. To solve these issues, we propose a new class of data management systems: embedded analytical systems. These systems are tightly integrated with analytical tools and provide fast and efficient access to the data stored within them. In this work, we describe the unique challenges and opportunities with respect to workloads, resilience and cooperation that are faced by this new class of systems, and the steps we have taken towards addressing them in the DuckDB system.

    Relational queries with a tensor processing unit

    Tensor Processing Units are specialized hardware devices built to train and apply Machine Learning models at high speed through high-bandwidth memory and massive instruction parallelism. In this short paper, we investigate how relational operations can be translated to those devices. We present a mapping of relational operators to TPU-supported TensorFlow operations and experimental results comparing it with GPU and CPU implementations. Results show that while raw speeds are enticing, TPUs are unlikely to improve relational query processing for now due to a variety of issues.
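
    The abstract does not list the paper's actual operator mapping. As a minimal sketch, assuming NumPy as a stand-in for TensorFlow (the table, keys, values, and function names are hypothetical), the following shows one common way to express a filtered aggregate and a grouped sum as dense array operations of the kind tensor hardware executes well.

```python
import numpy as np

# Hypothetical columnar table: a group key (0..2) and a numeric value per row
keys = np.array([0, 2, 1, 0, 2])
values = np.array([10.0, 5.0, 7.0, 3.0, 1.0])

def select_sum(values, threshold):
    """SELECT SUM(value) WHERE value > threshold, as a mask plus a reduction."""
    mask = values > threshold
    return float(np.sum(np.where(mask, values, 0.0)))

def group_by_sum(keys, values, num_groups):
    """SELECT key, SUM(value) GROUP BY key, as a one-hot matrix product."""
    # one_hot[i, g] == 1.0 exactly when row i belongs to group g
    one_hot = (keys[:, None] == np.arange(num_groups)[None, :]).astype(values.dtype)
    return one_hot.T @ values

print(select_sum(values, 4.0))
print(group_by_sum(keys, values, 3))
```

    The one-hot product does work proportional to rows times groups instead of a single pass over the rows, a small illustration of how a relational operator can become more expensive once it is forced into dense tensor form.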

    Scaling column imprints using advanced vectorization
